Margin-constrained Random Projections And Very Sparse Random Projections
Abstract
We propose methods for improving both the accuracy and efficiency of random projections, a popular dimension reduction technique in machine learning and data mining that is particularly useful for estimating pairwise distances. Let $A \in \mathbb{R}^{n \times D}$ be our n points in D dimensions. The method multiplies A by a random matrix $R \in \mathbb{R}^{D \times k}$, reducing the D dimensions down to just k. R typically consists of i.i.d. entries in N(0, 1). The cost of the projection mapping is O(nDk). This study proposes an improved estimator of pairwise distances with provably smaller variances (errors) by taking advantage of the marginal information. We also propose very sparse random projections, replacing the N(0, 1) entries in R with entries in {−1, 0, 1} with probabilities $\{\frac{1}{2\sqrt{D}},\, 1 - \frac{1}{\sqrt{D}},\, \frac{1}{2\sqrt{D}}\}$, achieving a significant $\sqrt{D}$-fold speedup with little loss in accuracy. Previously, Achlioptas proposed sparse random projections using entries in {−1, 0, 1} with probabilities $\{\frac{1}{6},\, \frac{2}{3},\, \frac{1}{6}\}$, achieving a threefold speedup.
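The very sparse projection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names are hypothetical, and two scaling choices are assumptions not stated in the abstract: the {−1, 0, 1} entries are multiplied by $\sqrt{s}$ with $s = \sqrt{D}$ so that each entry has unit variance, and the projected data are divided by $\sqrt{k}$ so that squared distances in the projected space estimate the original squared distances. The margin-based estimator is not sketched here.

```python
import numpy as np


def very_sparse_projection_matrix(D, k, rng):
    """Draw a D x k matrix with entries in {-1, 0, +1} using the
    probabilities {1/(2*sqrt(D)), 1 - 1/sqrt(D), 1/(2*sqrt(D))}.

    Assumption: entries are rescaled by sqrt(s), s = sqrt(D), so that
    each entry has mean 0 and variance 1, like an N(0, 1) entry.
    """
    s = np.sqrt(D)
    p = 1.0 / (2.0 * s)
    signs = rng.choice([-1.0, 0.0, 1.0], size=(D, k), p=[p, 1.0 - 2.0 * p, p])
    return np.sqrt(s) * signs


def project(A, R):
    """Map the n x D data matrix A to n x k.

    Assumption: the 1/sqrt(k) factor normalizes the projection so that
    squared distances are preserved in expectation.
    """
    k = R.shape[1]
    return (A @ R) / np.sqrt(k)


def squared_distance(B, i, j):
    """Squared Euclidean distance between rows i and j of B."""
    diff = B[i] - B[j]
    return float(diff @ diff)
```

A dense `ndarray` is used here only for clarity. On average only a $1/\sqrt{D}$ fraction of the entries is nonzero, so a practical version would store R in a sparse format to realize the speedup the abstract describes.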
Related resources
Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections
The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the $l_\alpha$ norm of a single data stream, or the $l_\alpha$ distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...
Random Projections for Anchor-based Topic Inference
Recent spectral topic discovery methods are extremely fast at processing large document corpora, but scale poorly with the size of the input vocabulary. Random projections are vital to ensure speed and limit memory usage. We empirically evaluate several methods for generating random projections and measure the effect of parameters such as sparsity and dimensionality. We find that methods with s...
Sparse signal recovery using sparse random projections
Memory and Computation Efficient PCA via Very Sparse Random Projections
Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Indeed, our approach is simultaneously effic...
Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data
We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating p...
Publication date: 2006